Dynamic random-access memory ( dynamic RAM or DRAM) is a type of random-access semiconductor memory that stores each bit of data in a memory cell. A DRAM memory cell usually consists of a tiny capacitor and a transistor, both typically based on metal–oxide–semiconductor (MOS) technology.
While most DRAM memory cell designs use a capacitor and transistor, some only use two transistors. In the designs where a capacitor is used, the capacitor can either be charged or discharged; these two states are taken to represent the two values of a bit, conventionally called 0 and 1. The electric charge on the capacitors gradually leaks away; without intervention, the data on the capacitor would soon be lost. To prevent this, DRAM requires an external memory refresh circuit which periodically rewrites the data in the capacitors, restoring them to their original charge. This refresh process is the defining characteristic of dynamic random-access memory, in contrast to static random-access memory (SRAM) which does not require data to be refreshed. Unlike flash memory, DRAM is volatile memory (as opposed to non-volatile memory), since it loses its data quickly when power is removed. However, DRAM does exhibit limited data remanence.
DRAM typically takes the form of an integrated circuit chip, which can consist of dozens to billions of DRAM memory cells. DRAM chips are widely used in digital electronics where low-cost, high-capacity computer memory is required. One of the largest applications for DRAM is the main memory (colloquially called the RAM) in modern and (where the main memory is called the graphics memory). It is also used in many portable devices and video game consoles. In contrast, SRAM, which is faster and more expensive than DRAM, is typically used where speed is of greater concern than cost and size, such as the CPU cache in processors.
The need to refresh DRAM demands more complicated circuitry and timing than SRAM. This complexity is offset by the structural simplicity of DRAM memory cells: only one transistor and a capacitor are required per bit, compared to four or six transistors in SRAM. This allows DRAM to reach very high densities with a simultaneous reduction in cost per bit. Refreshing the data consumes power, causing a variety of techniques to be used to manage the overall power consumption. For this reason, DRAM usually needs to operate with a memory controller; the memory controller needs to know DRAM parameters, especially memory timings, to initialize DRAMs, which may be different depending on different DRAM manufacturers and part numbers.
DRAM had a 47% increase in the price-per-bit in 2017, the largest jump in 30 years since the 45% jump in 1988. In 2018, a "key characteristic of the DRAM market is that there are currently only three major suppliers — Micron Technology, SK Hynix and Samsung Electronics" that are "keeping a pretty tight rein on their capacity".
Other manufacturers make and sell but not the DRAM chips in them, such as Kingston Technology, and some manufacturers sell stacked DRAM (used e.g. in the fastest on the exascale) separately such as Viking Technology. Others sell such integrated into other products, such as Fujitsu into its CPUs, AMD in GPUs, and Nvidia, with HBM2 in some of their GPU chips.
In November 1965, Toshiba introduced a bipolar dynamic RAM for its electronic calculator Toscal BC-1411. In 1966, Tomohisa Yoshimaru and Hiroshi Komikawa from Toshiba applied for a Japanese patent of a memory circuit composed of several transistors and a capacitor, in 1967 they applied for a patent in the US.
The earliest forms of DRAM mentioned above used bipolar transistors. While it offered improved performance over magnetic-core memory, bipolar DRAM could not compete with the lower price of the then-dominant magnetic-core memory. Capacitors had also been used for earlier memory schemes, such as the drum of the Atanasoff–Berry Computer, the Williams tube and the Selectron tube.
MOS DRAM chips were commercialized in 1969 by Advanced Memory Systems, Inc., of Sunnyvale, CA. This 1024 bit chip was sold to Honeywell, Raytheon, Wang Laboratories, and others. The same year, Honeywell asked Intel to make a DRAM using a three-transistor cell that they had developed. This became the Intel 1102 in early 1970. However, the 1102 had many problems, prompting Intel to begin work on their own improved design, in secrecy to avoid conflict with Honeywell. This became the first commercially available DRAM, the Intel 1103, in October 1970, despite initial problems with low yield until the fifth revision of the photomask. The 1103 was designed by Joel Karp and laid out by Pat Earhart. The masks were cut by Barbara Maness and Judy Garcia. MOS memory overtook magnetic-core memory as the dominant memory technology in the early 1970s.
The first DRAM with multiplexed row and column address bus was the Mostek MK4096 4 Kbit DRAM designed by Robert Proebsting and introduced in 1973. This addressing scheme uses the same address pins to receive the low half and the high half of the address of the memory cell being referenced, switching between the two halves on alternating bus cycles. This was a radical advance, effectively halving the number of address lines required, which enabled it to fit into packages with fewer pins, a cost advantage that grew with every jump in memory size. The MK4096 proved to be a very robust design for customer applications. At the 16 Kbit density, the cost advantage increased; the 16 Kbit Mostek MK4116 DRAM, introduced in 1976, achieved greater than 75% worldwide DRAM market share. However, as density increased to 64 Kbit in the early 1980s, Mostek and other US manufacturers were overtaken by Japanese DRAM manufacturers, which dominated the US and worldwide markets during the 1980s and 1990s.
Early in 1985, Gordon Moore decided to withdraw Intel from producing DRAM. By 1986, many, but not all, United States chip makers had stopped making DRAMs. Micron Technology and Texas Instruments continued to produce them commercially, and IBM produced them for internal use.
In 1985, when 64K DRAM memory chips were the most common memory chips used in computers, and when more than 60 percent of those chips were produced by Japanese companies, semiconductor makers in the United States accused Japanese companies of export dumping for the purpose of driving makers in the United States out of the commodity memory chip business. Prices for the 64K product plummeted to as low as 35 cents apiece from $3.50 within 18 months, with disastrous financial consequences for some U.S. firms. On 4 December 1985 the US Commerce Department's International Trade Administration ruled in favor of the complaint.
Synchronous dynamic random-access memory (SDRAM) was developed by Samsung. The first commercial SDRAM chip was the Samsung KM48SL2000, which had a capacity of 16Mebibit, and was introduced in 1992. The first commercial DDR SDRAM (double data rate SDRAM) memory chip was Samsung's 64Mbit DDR SDRAM chip, released in 1998.
Later, in 2001, Japanese DRAM makers accused Korean DRAM manufacturers of dumping.
In 2002, US computer makers made claims of DRAM price fixing.
The long horizontal lines connecting each row are known as word-lines. Each column of cells is composed of two bit-lines, each connected to every other storage cell in the column (the illustration to the right does not include this important detail). They are generally known as the + and − bit lines.
A sense amplifier is essentially a pair of cross-connected inverters between the bit-lines. The first inverter is connected with input from the + bit-line and output to the − bit-line. The second inverter's input is from the − bit-line with output to the + bit-line. This results in positive feedback which stabilizes after one bit-line is fully at its highest voltage and the other bit-line is at the lowest possible voltage.
Some systems refresh every row in a burst of activity involving all rows every 64 ms. Other systems refresh one row at a time staggered throughout the 64 ms interval. For example, a system with 213 = 8,192 rows would require a staggered refresh rate of one row every 7.8 μs which is 64 ms divided by 8,192 rows. A few real-time systems refresh a portion of memory at a time determined by an external timer function that governs the operation of the rest of a system, such as the vertical blanking interval that occurs every 10–20 ms in video equipment.
The row address of the row that will be refreshed next is maintained by external logic or a counter within the DRAM. A system that provides the row address (and the refresh command) does so to have greater control over when to refresh and which row to refresh. This is done to minimize conflicts with memory accesses, since such a system has both knowledge of the memory access patterns and the refresh requirements of the DRAM. When the row address is supplied by a counter within the DRAM, the system relinquishes control over which row is refreshed and only provides the refresh command. Some modern DRAMs are capable of self-refresh; no external logic is required to instruct the DRAM to refresh or to provide a row address.
Under some conditions, most of the data in DRAM can be recovered even if the DRAM has not been refreshed for several minutes.
| + Asynchronous DRAM typical timing |
| Random read or write cycle time (from one full /RAS cycle to another) |
| Access time: /RAS low to valid data out |
| /RAS low to /CAS low time |
| /RAS pulse width (minimum /RAS low time) |
| /RAS precharge time (minimum /RAS high time) |
| Page-mode read or write cycle time (/CAS to /CAS) |
| Access time: Column address valid to valid data out (includes address setup time before /CAS low) |
| Access time: /CAS low to valid data out |
| /CAS low pulse width minimum |
When such a RAM is accessed by clocked logic, the times are generally rounded up to the nearest clock cycle. For example, when accessed by a 100 MHz state machine (i.e. a 10 ns clock), the 50 ns DRAM can perform the first read in five clock cycles, and additional reads within the same page every two clock cycles. This was generally described as timing, as bursts of four reads within a page were common.
When describing synchronous memory, timing is described by clock cycle counts separated by hyphens. These numbers represent in multiples of the DRAM clock cycle time. Note that this is half of the data transfer rate when double data rate signaling is used. JEDEC standard PC3200 timing is with a 200 MHz clock, while premium-priced high performance PC3200 DDR DRAM DIMM might be operated at timing.
| + Synchronous DRAM typical timing !rowspan=2 colspan=2 | Description |
|
|
The capacitor has two terminals, one of which is connected to its access transistor, and the other to either ground or VCC/2. In modern DRAMs, the latter case is more common, since it allows faster operation. In modern DRAMs, a voltage of +VCC/2 across the capacitor is required to store a logic one; and a voltage of −VCC/2 across the capacitor is required to store a logic zero. The resultant charge is , where Q is the charge in and C is the capacitance in .
Reading or writing a logic one requires the wordline be driven to a voltage greater than the sum of VCC and the access transistor's threshold voltage (VTH). This voltage is called VCC pumped (VCCP). The time required to discharge a capacitor thus depends on what logic value is stored in the capacitor. A capacitor containing logic one begins to discharge when the voltage at the access transistor's gate terminal is above VCCP. If the capacitor contains a logic zero, it begins to discharge when the gate terminal voltage is above VTH.
The capacitor in the stacked capacitor scheme is constructed above the surface of the substrate. The capacitor is constructed from an oxide-nitride-oxide (ONO) dielectric sandwiched in between two layers of polysilicon plates (the top plate is shared by all DRAM cells in an IC), and its shape can be a rectangle, a cylinder, or some other more complex shape. There are two basic variations of the stacked capacitor, based on its location relative to the bitline—capacitor-under-bitline (CUB) and capacitor-over-bitline (COB). In the former, the capacitor is underneath the bitline, which is usually made of metal, and the bitline has a polysilicon contact that extends downwards to connect it to the access transistor's source terminal. In the latter, the capacitor is constructed above the bitline, which is almost always made of polysilicon, but is otherwise identical to the COB variation. The advantage the COB variant possesses is the ease of fabricating the contact between the bitline and the access transistor's source as it is physically close to the substrate surface. However, this requires the active area to be laid out at a 45-degree angle when viewed from above, which makes it difficult to ensure that the capacitor contact does not touch the bitline. CUB cells avoid this, but suffer from difficulties in inserting contacts in between bitlines, since the size of features this close to the surface are at or near the minimum feature size of the process technology (Kenner, pp. 33–42).
The trench capacitor is constructed by etching a deep hole into the silicon substrate. The substrate volume surrounding the hole is then heavily doped to produce a buried n+ plate with low resistance. A layer of oxide-nitride-oxide dielectric is grown or deposited, and finally the hole is filled by depositing doped polysilicon, which forms the top plate of the capacitor. The top of the capacitor is connected to the access transistor's drain terminal via a polysilicon strap (Kenner, pp. 42–44). A trench capacitor's depth-to-width ratio in DRAMs of the mid-2000s can exceed 50:1 (Jacob, p. 357).
Trench capacitors have numerous advantages. Since the capacitor is buried in the bulk of the substrate instead of lying on its surface, the area it occupies can be minimized to what is required to connect it to the access transistor's drain terminal without decreasing the capacitor's size, and thus capacitance (Jacob, pp. 356–357). Alternatively, the capacitance can be increased by etching a deeper hole without any increase to surface area (Kenner, p. 44). Another advantage of the trench capacitor is that its structure is under the layers of metal interconnect, allowing them to be more easily made planar, which enables it to be integrated in a logic-optimized process technology, which have many levels of interconnect above the substrate. The fact that the capacitor is under the logic means that it is constructed before the transistors are. This allows high-temperature processes to fabricate the capacitors, which would otherwise degrade the logic transistors and their performance. This makes trench capacitors suitable for constructing embedded DRAM (eDRAM) (Jacob, p. 357). Disadvantages of trench capacitors are difficulties in reliably constructing the capacitor's structures within deep holes and in connecting the capacitor to the access transistor's drain terminal (Kenner, p. 44).
In 1T DRAM cells, the bit of data is still stored in a capacitive region controlled by a transistor, but this capacitance is no longer provided by a separate capacitor. 1T DRAM is a "capacitorless" bit cell design that stores data using the parasitic body capacitance that is inherent to silicon on insulator (SOI) transistors. Considered a nuisance in logic design, this floating body effect can be used for data storage. This gives 1T DRAM cells the greatest density as well as allowing easier integration with high-performance logic circuits since they are constructed with the same SOI process technologies.
Refreshing of cells remains necessary, but unlike with 1T1C DRAM, reads in 1T DRAM are non-destructive; the stored charge causes a detectable shift in the threshold voltage of the transistor. Performance-wise, access times are significantly better than capacitor-based DRAMs, but slightly worse than SRAM. There are several types of 1T DRAMs: the commercialized Z-RAM from Innovative Silicon, the TTRAM
The horizontal wire, the wordline, is connected to the gate terminal of every access transistor in its row. The vertical bitline is connected to the source terminal of the transistors in its column. The lengths of the wordlines and bitlines are limited. The wordline length is limited by the desired performance of the array, since propagation time of the signal that must traverse the wordline is determined by the RC time constant. The bitline length is limited by its capacitance (which increases with length), which must be kept within a range for proper sensing (as DRAMs operate by sensing the charge of the capacitor released onto the bitline). Bitline length is also limited by the amount of operating current the DRAM can draw and by how power can be dissipated, since these two characteristics are largely determined by the charging and discharging of the bitline.
The DRAM cells that are on the edges of the array do not have adjacent segments. Since the differential sense amplifiers require identical capacitance and bitline lengths from both segments, dummy bitline segments are provided. The advantage of the open bitline array is a smaller array area, although this advantage is slightly diminished by the dummy bitline segments. The disadvantage that caused the near disappearance of this architecture is the inherent vulnerability to noise, which affects the effectiveness of the differential sense amplifiers. Since each bitline segment does not have any spatial relationship to the other, it is likely that noise would affect only one of the two bitline segments.
This architecture is referred to as folded because it takes its basis from the open array architecture from the perspective of the circuit schematic. The folded array architecture appears to remove DRAM cells in alternate pairs (because two DRAM cells share a single bitline contact) from a column, then move the DRAM cells from an adjacent column into the voids.
The location where the bitline twists occupies additional area. To minimize area overhead, engineers select the simplest and most area-minimal twisting scheme that is able to reduce noise under the specified limit. As process technology improves to reduce minimum feature sizes, the signal to noise problem worsens, since coupling between adjacent metal wires is inversely proportional to their pitch. The array folding and bitline twisting schemes that are used must increase in complexity in order to maintain sufficient noise reduction. Schemes that have desirable noise immunity characteristics for a minimal impact in area is the topic of current research (Kenner, p. 37).
The problem can be mitigated by using redundant memory bits and additional circuitry that use these bits to detect and correct soft errors. In most cases, the detection and correction are performed by the memory controller; sometimes, the required logic is transparently implemented within DRAM chips or modules, enabling the ECC memory functionality for otherwise ECC-incapable systems. The extra memory bits are used to record RAM parity and to enable missing data to be reconstructed by error-correcting code (ECC). Parity allows the detection of all single-bit errors (actually, any odd number of wrong bits). The most common error-correcting code, a SECDED Hamming code, allows a single-bit error to be corrected and, in the usual configuration, with an extra parity bit, double-bit errors to be detected.
Recent studies give widely varying error rates with over seven orders of magnitude difference, ranging from , roughly one bit error, per hour, per gigabyte of memory to one bit error, per century, per gigabyte of memory.Borucki, "Comparison of Accelerated DRAM Soft Error Rates Measured at Component and System Level", 46th Annual International Reliability Physics Symposium, Phoenix, 2008, pp. 482–487Bianca Schroeder et al. (2009). "DRAM errors in the wild: a large-scale field study" . Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems, pp. 193–204. The Schroeder et al. 2009 study reported a 32% chance that a given computer in their study would suffer from at least one correctable error per year, and provided evidence that most such errors are intermittent hard rather than soft errors and that trace amounts of radioactive material that had gotten into the chip packaging were emitting alpha particles and corrupting the data. A 2010 study at the University of Rochester also gave evidence that a substantial fraction of memory errors are intermittent hard errors. Large scale studies on non-ECC main memory in PCs and laptops suggest that undetected memory errors account for a substantial number of system failures: the 2011 study reported a 1-in-1700 chance per 1.5% of memory tested (extrapolating to an approximately 26% chance for total memory) that a computer would have a memory error every eight months.
This property can be used to circumvent security and recover data stored in the main memory that is assumed to be destroyed at power-down. The computer could be quickly rebooted, and the contents of the main memory read out; or by removing a computer's memory modules, cooling them to prolong data remanence, then transferring them to a different computer to be read out. Such an attack was demonstrated to circumvent popular disk encryption systems, such as the open source TrueCrypt, Microsoft's BitLocker Drive Encryption, and Apple's FileVault. 080222 citp.princeton.edu This type of attack against a computer is often called a cold boot attack.
This interface provides direct control of internal timing: when is driven low, a cycle must not be attempted until the sense amplifiers have sensed the memory state, and must not be returned high until the storage cells have been refreshed. When is driven high, it must be held high long enough for precharging to complete.
Although the DRAM is asynchronous, the signals are typically generated by a clocked memory controller, which limits their timing to multiples of the controller's clock cycle.
For completeness, we mention two other control signals which are not essential to DRAM operation, but are provided for the convenience of systems using DRAM:
The refresh cycles are distributed across the entire refresh interval in such a way that all rows are refreshed within the required interval. To refresh one row of the memory array using only refresh (ROR), the following steps must occur:
This can be done by supplying a row address and pulsing low; it is not necessary to perform any cycles. An external counter is needed to iterate over the row addresses in turn. In some designs, the CPU handled RAM refresh. The Zilog Z80 is perhaps the best known example, as it has an internal row counter R which supplies the address for a special refresh cycle generated after each instruction fetch. In other systems, especially , refresh was handled by the video circuitry as a side effect of its periodic scan of the frame buffer.
Page mode DRAM was in turn later improved with a small modification which further reduced latency. DRAMs with this improvement are called fast page mode DRAMs ( FPM DRAMs). In page mode DRAM, the chip does not capture the column address until is asserted, so column access time (until data out was valid) begins when is asserted. In FPM DRAM, the column address can be supplied while is still deasserted, and the main column access time ( tAA) begins as soon as the address is stable. The signal is only needed to enable the output (the data out pins were held at high-Z while was deasserted), so time from assertion to data valid ( tCAC) is greatly reduced. Fast page mode DRAM was introduced in 1986 and was used with the Intel 80486.
Static column is a variant of fast page mode in which the column address does not need to be latched, but rather the address inputs may be changed with held low, and the data output will be updated accordingly a few nanoseconds later.
Nibble mode is another variant in which four sequential locations within the row can be accessed with four consecutive pulses of . The difference from normal page mode is that the address inputs are not used for the second through fourth edges but are generated internally starting with the address supplied for the first edge. The predictable addresses let the chip prepare the data internally and respond very quickly to the subsequent pulses.
To be precise, EDO DRAM begins data output on the falling edge of but does not disable the output when rises again. Instead, it holds the current output valid (thus extending the data output time) even as the DRAM begins decoding a new column address, until either a new column's data is selected by another falling edge, or the output is switched off by the rising edge of . (Or, less commonly, a change in , , or .)
This ability to start a new access even before the system has received the preceding column's data made it possible to design memory controllers which could carry out a access (in the currently open row) in one clock cycle, or at least within two clock cycles instead of the previously required three. EDO's capabilities were able to partially compensate for the performance lost due to the lack of an L2 cache in low-cost, commodity PCs. More expensive notebooks also often lacked an L2 cache due to size and power limitations, and benefitted similarly. Even for systems with an L2 cache, the availability of EDO memory improved the average memory latency seen by applications over earlier FPM implementations.
Single-cycle EDO DRAM became very popular on video cards toward the end of the 1990s. It was very low cost, yet nearly as efficient for performance as the far more costly VRAM.
Although BEDO DRAM showed additional optimization over EDO, by the time it was available the market had made a significant investment towards synchronous DRAM, or SDRAM. Even though BEDO RAM was superior to SDRAM in some ways, the latter technology quickly displaced BEDO.
The and inputs no longer act as strobes, but are instead, along with , part of a 3-bit command:
| + SDRAM Command summary ! ! ! ! ! Address ! Command |
| Command inhibit (no operation) |
| No operation |
| Burst Terminate: stop a read or write burst in progress. |
| Read from currently active row. |
| Write to currently active row. |
| Activate a row for read and write. |
| Precharge (deactivate) the current row. |
| Auto refresh: refresh one row of each bank, using an internal counter. |
| Load mode register: address bus specifies DRAM operation mode. |
The line's function is extended to a per-byte DQM signal, which controls data input (writes) in addition to data output (reads). This allows DRAM chips to be wider than 8 bits while still supporting byte-granularity writes.
Many timing parameters remain under the control of the DRAM controller. For example, a minimum time must elapse between a row being activated and a read or write command. One important parameter must be programmed into the SDRAM chip itself, namely the CAS latency. This is the number of clock cycles allowed for internal operations between a read command and the first data word appearing on the data bus. The Load mode register command is used to transfer this value to the SDRAM chip. Other configurable parameters include the length of read and write bursts, i.e. the number of words transferred per read or write command.
The most significant change, and the primary reason that SDRAM has supplanted asynchronous RAM, is the support for multiple internal banks inside the DRAM chip. Using a few bits of bank address that accompany each command, a second bank can be activated and begin reading data while a read from the first bank is in progress. By alternating banks, a single SDRAM device can keep the data bus continuously busy, in a way that asynchronous DRAM cannot.
Some DRAM components have a self-refresh mode. While this involves much of the same logic that is needed for pseudo-static operation, this mode is often equivalent to a standby mode. It is provided primarily to allow a system to suspend operation of its DRAM controller to save power without losing data stored in DRAM, rather than to allow operation without a separate DRAM controller as is in the case of mentioned PSRAMs.
An EDRAM variant of PSRAM was sold by MoSys under the name 1T-SRAM. It is a set of small DRAM banks with an SRAM cache in front to make it behave much like a true SRAM. It is used in Nintendo GameCube and Wii video game consoles.
Cypress Semiconductor's HyperRAM is a type of PSRAM supporting a JEDEC-compliant 8-pin HyperBus or Octal xSPI interface.
|
|